Approximate K Nearest Neighbors in High Dimensions

Authors

  • George Dahl
  • Mary Wootters
Abstract

Given a set P of N points in a d-dimensional space, along with a query point q, it is often desirable to find k points of P that are, with high probability, close to q. This is the Approximate k-Nearest-Neighbors (AkNN) problem. We present two algorithms for AkNN. Both require O(N^2 d) preprocessing time. The first algorithm has a query time cost of O(d + log N), while the second has a query time cost of O(d). Both algorithms create an undirected graph on the points of P by adding edges to a linked list storing P in Hilbert order. To find approximate nearest neighbors of a query point, both algorithms perform best-first search on this graph. The first algorithm uses standard one-dimensional indexing structures to find starting points on the graph for this search, whereas the second algorithm uses random starting points. Despite the quadratic preprocessing time, our algorithms have the potential to be useful in machine learning applications where the number of query points that need to be processed is large compared to the number of points in P. The linear dependence in d of the preprocessing and query time costs of our algorithms allows them to remain effective even when dealing with high-
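
The graph-plus-best-first-search idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which edges are added to the Hilbert-ordered list, so the sketch assumes each point is also linked to a few exact nearest neighbors found by brute force (quadratic in N). The ordering used here (sorting by the first coordinate) is only a placeholder for a true Hilbert order, and the names `build_graph`, `best_first_knn`, `extra`, and `budget` are hypothetical.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_graph(points, order, extra=2):
    """Link consecutive points of `order` (the "linked list"), then add
    edges to each point's `extra` nearest neighbors.

    The extra-edge rule is an assumption; it uses brute force, so the
    preprocessing cost is quadratic in the number of points.
    """
    n = len(points)
    adj = [set() for _ in range(n)]
    for a, b in zip(order, order[1:]):
        adj[a].add(b)
        adj[b].add(a)
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist(points[i], points[j]))[:extra]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def best_first_knn(points, adj, q, k, starts, budget=50):
    """Best-first search: always expand the unexpanded node closest to q,
    and stop after `budget` expansions (a hypothetical cutoff)."""
    seen = set(starts)
    frontier = [(dist(points[s], q), s) for s in seen]
    heapq.heapify(frontier)
    visited = []
    while frontier and len(visited) < budget:
        d, u = heapq.heappop(frontier)
        visited.append((d, u))
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                heapq.heappush(frontier, (dist(points[v], q), v))
    # Return the k visited nodes closest to q, nearest first.
    return [u for _, u in heapq.nsmallest(k, visited)]


# Usage with random starting points, as in the second algorithm.
rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
# Placeholder ordering: a real implementation would sort by Hilbert index.
order = sorted(range(len(points)), key=lambda i: points[i][0])
adj = build_graph(points, order)
q = (0.5, 0.5, 0.5)
starts = rng.sample(range(len(points)), 3)
result = best_first_knn(points, adj, q, k=5, starts=starts)
```

Because the chain of consecutive points keeps the graph connected, raising `budget` to the number of points makes the search visit every node and return the exact k nearest neighbors. Smaller budgets trade accuracy for query time.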

Similar Resources

An Efficient Searching Algorithm for Approximate Nearest Neighbor Queries in High Dimensions

In this paper, we present an approximate nearest neighbor search algorithm that uses heuristics to decide whether or not to access a node in the index tree based on three interesting data distribution properties. We demonstrate that the proposed algorithm significantly reduces the number of nodes accessed over the algorithms that have been proposed in earlier works. Also, it will be demonstrate...

Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning

We quantitatively analyze the performance of exact and approximate nearest-neighbors algorithms on increasingly high-dimensional problems in the context of sampling-based motion planning. We study the impact of the dimension, number of samples, distance metrics, and sampling schemes on the efficiency and accuracy of nearest-neighbors algorithms. Efficiency measures computation time and accuracy...

Implementing a Parallel Dynamic Approximate Nearest Neighbor Search Algorithm

We describe the implementation of a fast, dynamic, approximate nearest-neighbor search algorithm that works well in fixed dimensions (d ≤ 5), based on sorting point coordinates in Morton (or z-) ordering. Our code scales well on multi-core/CPU shared-memory systems. Our implementation is competitive with the best approximate nearest neighbor searching codes available on the web, especially fo...

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

High-Dimensional Similarity Search Using Data-Sensitive Space Partitioning

Nearest neighbor search has a wide variety of applications. Unfortunately, the majority of search methods do not scale well with dimensionality. Recent efforts have been focused on finding better approximate solutions that improve the locality of data using dimensionality reduction. However, it is possible to preserve the locality of data and find exact nearest neighbors in high dimensions with...

Publication date: 2008